2  Lab 8 Recap

We are continuing with statistical report writing, which was first covered in Lab 8. By the end of that lab, you had produced a statistical report with complete Introduction and Exploratory Analysis sections, similar to the example report available on Moodle.

A typical statistical report will follow the structure below.

  1. Introduction: this is where you should write a short description of the problem and provide context such as the data and variables collected. The aims of the analysis should also be clearly stated here.

  2. Exploratory Analysis: this is where you should include appropriate summary statistics and plots you’ve created from the data. These should provide an initial impression and informally answer the aims of your analysis. Any tables and/or figures included should always be clearly captioned.

  3. Statistical Analysis: this is where the aims of the analysis will be formally answered. Describe the statistical methods and techniques you will use to analyse the data and detail any assumptions you are making about the data. Include any relevant output from the analysis you’ve completed (you should not include any code from your R script in a report) and interpret the output in the context of the problem.

  4. Conclusion: use your interpretation to provide an answer to the aims of the analysis. Discuss how this formal conclusion compares to the initial impressions of the data from the Exploratory Analysis, and discuss any limitations in your analysis. For example, are there any other techniques you could use in further analysis to provide a better result?

A template for a general statistical report, using the structure above, is included at the top of the Lab section on Moodle, and you can always use it for reference. In this lab, however, we will continue writing the report produced in Lab 8 by focusing on sections (3) and (4).

As a reminder, we are investigating a dataset which holds information on the annual yield of potatoes and wheat from fields which are fertilised and fields which are unfertilised.

These data were gathered by a group of farmers who are interested in investigating whether the mean yield of each crop differs when a new fertiliser is applied, compared to when it is not applied. They applied the fertiliser to a total of 130 fields where potatoes are grown and 90 fields where wheat is grown. A further 80 potato fields and 120 wheat fields were left unfertilised, and the annual yield of crops from each field was recorded (in tonnes).

The data are saved on Moodle and will need to be downloaded and saved somewhere accessible. You may have already done this in Lab 8.

To complete the analysis in this lab, the data need to be read in and any categorical variables saved as factors. This is a repeat of the steps completed in Lab 8, but make sure to run the code below again in your own RStudio window before starting Lab 9.

yields <- read.csv(file = "yields.csv")

yields$fertiliser <- factor(x = yields$fertiliser, 
                            levels = c("Used", "Not used"))

yields$crop <- factor(x = yields$crop, 
                      levels = c("potato", "wheat"))
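A quick way to confirm that the factor conversion worked is to cross-tabulate the two variables. The sketch below uses a small made-up data frame, `toy_yields`, in place of the real file, so its values and counts are illustrative only, and the column name `yield` is an assumption. With the real data, the same `table()` call on `yields$fertiliser` and `yields$crop` should reproduce the field counts described above.

```r
# Toy stand-in for yields.csv: all values are invented for illustration only.
toy_yields <- data.frame(
  fertiliser = c("Used", "Not used", "Used", "Not used", "Used", "Not used"),
  crop       = c("potato", "potato", "wheat", "wheat", "potato", "wheat"),
  yield      = c(10.2, 9.8, 7.1, 6.4, 10.5, 6.0)  # assumed column name
)

# Same conversion as for the real data.
toy_yields$fertiliser <- factor(x = toy_yields$fertiliser,
                                levels = c("Used", "Not used"))
toy_yields$crop <- factor(x = toy_yields$crop,
                          levels = c("potato", "wheat"))

# Any label not listed in `levels` is silently turned into NA, so check for that.
anyNA(toy_yields$fertiliser)
anyNA(toy_yields$crop)

# Count the fields in each fertiliser/crop combination.
field_counts <- table(toy_yields$fertiliser, toy_yields$crop)
field_counts
```

If either `anyNA()` call returns `TRUE` on the real data, the labels in the file probably differ from those given in `levels` (for example in capitalisation).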

So far, the analysis has been done using four subsets: potato_fertilised, potato_unfertilised, wheat_fertilised and wheat_unfertilised. We can create each of these again, so that they are saved in the Environment tab ready to use, with the following code.

potato_fertilised <- subset(x = yields,
                            subset = (fertiliser == "Used" & crop == "potato"))

potato_unfertilised <- subset(x = yields,
                              subset = (fertiliser == "Not used" & crop == "potato"))

wheat_fertilised <- subset(x = yields,
                           subset = (fertiliser == "Used" & crop == "wheat"))

wheat_unfertilised <- subset(x = yields,
                             subset = (fertiliser == "Not used" & crop == "wheat"))
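Since every field belongs to exactly one fertiliser/crop combination, the four subsets should together account for every row of the data. The sketch below illustrates this check on a made-up data frame (`toy_yields` and the `toy_` subsets are hypothetical names, not part of the lab data). With the real data, the subset sizes should match the field counts described earlier (130, 80, 90 and 120, giving 420 fields in total).

```r
# Toy stand-in for the real data: all values are invented for illustration only.
toy_yields <- data.frame(
  fertiliser = factor(c("Used", "Not used", "Used", "Not used", "Used"),
                      levels = c("Used", "Not used")),
  crop       = factor(c("potato", "potato", "wheat", "wheat", "potato"),
                      levels = c("potato", "wheat"))
)

# Same subsetting pattern as above, applied to the toy data.
toy_potato_fertilised <- subset(x = toy_yields,
                                subset = (fertiliser == "Used" & crop == "potato"))
toy_potato_unfertilised <- subset(x = toy_yields,
                                  subset = (fertiliser == "Not used" & crop == "potato"))
toy_wheat_fertilised <- subset(x = toy_yields,
                               subset = (fertiliser == "Used" & crop == "wheat"))
toy_wheat_unfertilised <- subset(x = toy_yields,
                                 subset = (fertiliser == "Not used" & crop == "wheat"))

# Number of rows (fields) in each subset.
subset_sizes <- c(potato_fertilised   = nrow(toy_potato_fertilised),
                  potato_unfertilised = nrow(toy_potato_unfertilised),
                  wheat_fertilised    = nrow(toy_wheat_fertilised),
                  wheat_unfertilised  = nrow(toy_wheat_unfertilised))
subset_sizes

# The subsets partition the data, so their sizes should sum to the total.
sum(subset_sizes) == nrow(toy_yields)
```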

Some of the key findings from an exploratory analysis of the data are that:

  • the sample mean yields of potatoes from fertilised and from unfertilised fields are approximately equal.

  • the sample standard deviations of potato yields from fertilised and from unfertilised fields are approximately equal.

  • the yields of potatoes from fertilised and unfertilised fields both approximately follow a normal distribution.

  • the sample mean yield of wheat from fertilised fields is greater than the sample mean yield of wheat from unfertilised fields.

  • the sample standard deviation of wheat yields from fertilised fields is greater than the sample standard deviation of wheat yields from unfertilised fields.

  • the yields of wheat from fertilised and unfertilised fields both approximately follow a normal distribution.
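As a reminder of how summary statistics like these can be obtained, the sketch below computes the sample mean and sample standard deviation for each crop/fertiliser combination using `tapply()`. It runs on a made-up data frame, `toy_yields`, so the numbers it produces are not the findings listed above, and the column name `yield` is an assumption about how the yields are stored in the real data.

```r
# Toy stand-in for the real data: all values are invented for illustration only.
toy_yields <- data.frame(
  crop       = factor(rep(c("potato", "wheat"), each = 4),
                      levels = c("potato", "wheat")),
  fertiliser = factor(rep(c("Used", "Used", "Not used", "Not used"), times = 2),
                      levels = c("Used", "Not used")),
  yield      = c(10, 12, 10, 12,   # potato: fertilised, then unfertilised
                 6, 10, 5, 7)      # wheat:  fertilised, then unfertilised
)

# Sample mean of yield for each crop (rows) and fertiliser group (columns).
group_means <- tapply(X = toy_yields$yield,
                      INDEX = list(toy_yields$crop, toy_yields$fertiliser),
                      FUN = mean)

# Sample standard deviation for the same groups.
group_sds <- tapply(X = toy_yields$yield,
                    INDEX = list(toy_yields$crop, toy_yields$fertiliser),
                    FUN = sd)

group_means
group_sds
```

Each result is a table with one row per crop and one column per fertiliser group, which makes the comparisons in the list above easy to read off.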